Sports Analytics & Catcher Framing in Baseball

Jim Albert, Emeritus Professor, BGSU

2024-03-01

Introduction

My Background

  • In Dept of Mathematics and Statistics at BGSU for 41 years
  • Interests include Bayesian modeling and statistical thinking in sports
  • Written books, several on baseball and statistics
  • Helped start a BGSU undergraduate program in Data Science

Book: Analyzing Baseball Data with R

  • Book provides first steps towards a professional career in baseball analytics

  • Where do you find baseball data?

  • What are useful R tools for exploring this data?

  • What are examples of baseball research?

Analyzing Baseball Data with R

  • 3rd edition (with Max Marchi and Ben Baumer) available in paperback this summer

  • Online version available for free at http://tinyurl.com/abdwr3e

  • Chapters on sources of baseball data, sabermetrics, and coding using R

  • New chapters (on handling big data, Shiny and composing with Quarto)

Book Blog

  • baseballwithr.wordpress.com
  • Started in Fall of 2013
  • Posts on topics in sabermetrics, modeling and R

Sports Analytics

What is Sports Analytics?

  • Using data to measure performance and facilitate decision making in sports

  • Baseball was one of the first sports to use data to address questions (sabermetrics)

  • Sports analytics is now applied in many sports

Sabermetrics - Some History

  • Bill James defined sabermetrics in 1980 as “the search for objective knowledge about baseball”

  • Moneyball (the book by Michael Lewis and the movie) describes the use of sabermetrics by the Oakland Athletics in the 2002 season

  • All 30 MLB teams currently have analytics departments

Sources of Baseball Data

  • Lahman database gives season to season data for all teams and players in baseball history

  • Retrosheet has game-by-game and play-by-play data for many seasons

  • Statcast is the newest source of data, covering pitches, balls put into play, and player locations

  • Practically all of this data is publicly available

Sabermetrics Problems

  • How to intelligently draft, sign and trade players?

  • How to measure performance?

  • How to use players in a game? (Defensive positioning, relief pitching)

Sports Analytics is an Academic Discipline

  • Journals such as the Journal of Quantitative Analysis of Sports and the Journal of Sports Analytics

  • Conferences: Saberseminar, Carnegie Mellon Sports Analytics Conference, and the New England Symposium on Statistics in Sports

  • Blogs (Baseball Prospectus, FanGraphs, etc.)

Are There Jobs in Sports Analytics?

  • YES!

  • Practically all professional sports teams have analytics groups

  • I know people working for MLB, NFL, NHL teams

  • Companies such as Zelus Analytics “provide sports intelligence as a service to professional teams”

Catcher Framing

Measuring Player Performance

  • What makes a good catcher?
  • How do you measure performance?
  • How important is this measurement?
  • Does it predict future performance?

Catcher Framing

  • Ability to catch a ball so it appears to be a strike
  • How do you measure this?
  • How do you adjust for other variables?
  • Does it matter?

1943 Dodger Way to Play Baseball

“The good receiver often makes many doubtful pitches strikes by catching the ball properly. This is not done by jerking or pulling the ball over the plate. Instead it is done by bringing all close pitches towards the belt buckle if they are just inside or outside of home plate … The entire action must be smooth if the umpire is to be deceived.”

From Power Ball: Anatomy of a Modern Baseball Game

The Count

  • Start at a 0-0 count (0 balls and 0 strikes)

  • Every pitch adds a strike or a ball

  • Possible counts: 0-0, 1-0, 0-1, 2-0, 1-1, 0-2, 3-0, 2-1, 1-2, 3-1, 2-2, 3-2
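The twelve counts listed above are every combination of fewer than four balls and fewer than three strikes. A minimal sketch that enumerates them (the variable names are illustrative only):

```python
# Enumerate every count that can occur during a plate appearance:
# 0-3 balls combined with 0-2 strikes.
counts = [(balls, strikes) for balls in range(4) for strikes in range(3)]

# Format as "balls-strikes", the convention used on the slide.
labels = [f"{balls}-{strikes}" for balls, strikes in counts]

print(len(labels))
print(labels)
```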

Types of Counts

  • Three types of counts: pitcher's (like 1-2), batter's (like 3-1), and neutral (like 1-1)

  • Outcome of every pitch in a plate appearance gives an advantage to the pitcher or the batter

  • How do you measure this advantage?

Runs Expectancy

  • State of an inning defined by the number of outs and the runners on base

  • There are 3 \(\times\) 8 = 24 possible inning states.

  • For each state, define “Runs Expectancy” to be the expected number of runs scored in the remainder of the inning.

  • Compute using data for a particular season
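As a rough illustration of the computation, the sketch below averages "runs scored in the remainder of the inning" within each (outs, runners) state. The records are made-up toy values, not real season data, and the three-character runner coding ("100" = runner on first only) is an assumption chosen for this example.

```python
from collections import defaultdict

# Hypothetical records: (outs, runners, runs scored from this point to the
# end of the inning). Runners are coded as a string for 1st/2nd/3rd base.
toy_records = [
    (0, "000", 0), (0, "000", 1), (0, "000", 0), (0, "000", 1),
    (1, "100", 0), (1, "100", 1), (1, "100", 0), (1, "100", 1),
    (2, "111", 0), (2, "111", 3),
]

def run_expectancy(records):
    """Average the rest-of-inning runs within each inning state."""
    totals = defaultdict(lambda: [0, 0])  # state -> [sum of runs, count]
    for outs, runners, runs_rest in records:
        cell = totals[(outs, runners)]
        cell[0] += runs_rest
        cell[1] += 1
    return {state: total / n for state, (total, n) in totals.items()}

# The 3 x 8 = 24 possible states: 0-2 outs crossed with 8 runner patterns.
all_states = [(outs, f"{code:03b}") for outs in range(3) for code in range(8)]

re_matrix = run_expectancy(toy_records)
for state in all_states:
    if state in re_matrix:
        print(state, re_matrix[state])
```

With a full season of play-by-play data, every one of the 24 states would be populated.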

Runs Expectancy Matrix (RE24)

Value of a Play

  • Measure by the Runs Expectancy Matrix

  • Look at the Runs in the “before” and “after” states

  • Runs Value = \(RE24_{after} - RE24_{before} + Runs \, Scored\)

Value of Home Run with Runner on 1st and One Out

  • Value of Starting State: 0.50

  • Value of Ending State: 0.25

  • Two runs scored on play

  • Value of Home Run is \[Value = 0.25 - 0.50 + 2 = 1.75 \, \, Runs\]

Value of Stolen Base with Runner on 1st and One Out

  • Value of Starting State: 0.50

  • Value of Ending State: 0.66

  • No runs scored on play

  • Value of Stolen Base is \[Value = 0.66 - 0.50 + 0 = 0.16 \, \, Runs\]
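The two worked examples above apply the same formula, which can be sketched as a small function. The function name `run_value` is illustrative; the state values are the ones quoted on the slides.

```python
def run_value(re_before, re_after, runs_scored):
    """Runs Value = RE24_after - RE24_before + Runs Scored."""
    return re_after - re_before + runs_scored

# Home run with a runner on 1st and one out: bases are empty afterwards.
home_run = run_value(re_before=0.50, re_after=0.25, runs_scored=2)

# Stolen base with a runner on 1st and one out: runner moves to 2nd.
stolen_base = run_value(re_before=0.50, re_after=0.66, runs_scored=0)

print(round(home_run, 2), round(stolen_base, 2))
```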

Runs Value of Pitches

  • Runs value of 1-2 count?

  • Look at all plate appearances that pass through a 1-2 count

  • Average the runs in the remainder of the inning for all these plate appearances

  • Repeat this process for all possible counts, and graph
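The procedure above can be sketched as follows. The plate appearances here are invented toy data, and each one is assumed to carry the list of counts it passed through plus the runs scored in the remainder of the inning.

```python
from collections import defaultdict

# Hypothetical plate appearances: (counts passed through, runs scored in the
# remainder of the inning).
toy_pas = [
    (["0-0", "0-1", "1-1", "1-2"], 0),
    (["0-0", "1-0", "2-0"], 1),
    (["0-0", "0-1", "0-2", "1-2"], 0),
    (["0-0", "1-0", "1-1", "1-2"], 1),
]

def mean_runs_by_count(plate_appearances):
    """For each count, average rest-of-inning runs over the plate
    appearances that pass through that count."""
    sums = defaultdict(float)
    ns = defaultdict(int)
    for counts_seen, runs_rest in plate_appearances:
        # set() so a plate appearance is counted once per count, even if a
        # two-strike foul ball leaves the count unchanged.
        for count in set(counts_seen):
            sums[count] += runs_rest
            ns[count] += 1
    return {count: sums[count] / ns[count] for count in sums}

by_count = mean_runs_by_count(toy_pas)
for count in sorted(by_count):
    print(count, round(by_count[count], 3))
```

Comparing each count's average with the 0-0 average shows which counts favor the pitcher and which favor the batter.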

Runs Value of Pitches

Runs to Wins

  • Bill James found an empirical relationship between R / RA and W / L

  • Pythagorean formula \[ \frac{W}{L} = \left(\frac{R}{RA}\right)^k \]

  • Rule of thumb: contributing 10 more runs is roughly equivalent to contributing one win for the team
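A quick numerical check of the runs-to-wins conversion, as a sketch only. The slide leaves the exponent k unspecified; k = 2 is assumed here, and the 700-run team and 162-game season are hypothetical inputs.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, k=2.0):
    """Winning percentage implied by W / L = (R / RA) ** k.
    k = 2 is an assumption; the slide does not fix its value."""
    ratio = (runs_scored / runs_allowed) ** k  # this is W / L
    return ratio / (1.0 + ratio)               # W / (W + L)

def expected_wins(runs_scored, runs_allowed, games=162, k=2.0):
    return games * pythagorean_win_pct(runs_scored, runs_allowed, k)

# Hypothetical average team: 700 runs scored and 700 allowed.
baseline_wins = expected_wins(700, 700)
improved_wins = expected_wins(710, 700)   # same team with 10 more runs
extra_wins = improved_wins - baseline_wins

print(baseline_wins, round(extra_wins, 2))
```

Under these assumptions the 10 extra runs are worth a little over one win.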

Value of a Strike

  • Each additional strike contributes runs to the defensive team

  • Each contribution is small, but the cumulative effect of many added strikes is large

  • Convert the runs contributed to wins
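To show how small per-strike contributions add up, the sketch below uses two figures quoted elsewhere in this talk: about 0.03 runs per called strike and about 10 runs per win. The 500 extra strikes are a hypothetical input, not a measured value.

```python
RUNS_PER_STRIKE = 0.03   # approximate value quoted in this talk
RUNS_PER_WIN = 10.0      # rule of thumb from the Runs to Wins slide

def strikes_to_runs_and_wins(extra_strikes):
    """Convert extra called strikes to runs saved, then to wins."""
    runs = extra_strikes * RUNS_PER_STRIKE
    wins = runs / RUNS_PER_WIN
    return runs, wins

# Hypothetical catcher who adds 500 called strikes.
runs_saved, wins_added = strikes_to_runs_and_wins(500)
print(round(runs_saved, 1), round(wins_added, 2))
```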

Called Balls and Strikes

  • Pitches are thrown towards a “strike zone”

  • Pitches where the batter doesn’t swing are called “strikes” or “balls” by the umpire

  • Pitches landing inside the zone should be called strikes

Average Strike Zone

Results of 1000 Called Pitches (Red = Strike, Blue = Ball)

Locations of 500 Missed Calls

What Influences the Called Pitch?

  • The umpire

  • The batter

  • The pitcher

  • The catcher

  • Other influences?

Actual Strike Zone

  • Data: all called pitches in 2016 season

  • Response: \(y\) = 1 (Strike) or 0 (Ball)

  • Input: (platex, platez) - location of pitch

  • Let \(p = P(y = 1) = Prob(Strike)\)

Model

  • Fit a generalized additive model\[ \log \left(\frac{p}{1-p}\right) = s(platex, platez) \] where \(s()\) is a smooth function of the location variables

  • Actual strike zone is defined where \[p = P(Strike) = 0.5\]
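The idea of an "actual" zone defined by P(Strike) = 0.5 can be illustrated without fitting a GAM. The sketch below swaps in a much cruder technique, binned strike rates on a grid, and applies it to simulated pitches. The simulation truth, the box dimensions (in feet), and the 0.5-foot bin width are all assumptions made up for this example.

```python
import math
import random

random.seed(1)

def true_strike_prob(x, z):
    # Hypothetical "truth" used only to simulate calls: probability falls
    # off smoothly with distance outside the box |x| <= 0.83, 1.5 <= z <= 3.5.
    dx = max(abs(x) - 0.83, 0.0)
    dz = max(1.5 - z, z - 3.5, 0.0)
    dist = math.hypot(dx, dz)
    return 1.0 / (1.0 + math.exp(-(3.0 - 12.0 * dist)))

# Simulate called pitches as (platex, platez, y), with y = 1 for a strike.
pitches = []
for _ in range(20000):
    x = random.uniform(-2.0, 2.0)
    z = random.uniform(0.0, 5.0)
    y = 1 if random.random() < true_strike_prob(x, z) else 0
    pitches.append((x, z, y))

def binned_strike_rate(called_pitches, width=0.5):
    """Observed strike rate in each width-by-width grid cell."""
    cells = {}
    for x, z, y in called_pitches:
        key = (math.floor(x / width), math.floor(z / width))
        strikes, n = cells.get(key, (0, 0))
        cells[key] = (strikes + y, n + 1)
    return {key: strikes / n for key, (strikes, n) in cells.items()}

rates = binned_strike_rate(pitches)

# Cells where a called pitch is more likely a strike than a ball.
zone_cells = {key for key, rate in rates.items() if rate >= 0.5}
print(len(zone_cells), "of", len(rates), "cells are inside the estimated zone")
```

A smooth model such as the GAM on the slide replaces the coarse cells with a continuous surface, so the zone boundary becomes a contour rather than a set of grid squares.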

Example - Count Effects

  • Location of the actual strike zone depends on the count

  • Look at actual zone at a 0-0 (Neutral) count

  • Compare with the actual zone on a 0-2 (pitcher's) count

Actual Strike Zone on 0-0 Count

Actual Zone on 0-2 Count

Catcher Framing

  • Catcher can influence the called pitch

  • Subtle way the ball is caught

  • How do you measure it?

  • How big an effect is it?

Multilevel Model

  • Outcome - called pitch (strike or ball)

  • Inputs:

  1. Location (platex, platez)
  2. Pitcher effect
  3. Batter effect
  4. Umpire effect
  5. Catcher effect
  • Measure catcher effect adjusting for other variables

Model

  • Generalized additive model \[ \log \left(\frac{p}{1-p}\right) = s(platex, platez) + p_{j(i)} + b_{k(i)} + u_{l(i)} + ca_{m(i)} \]

  • Each set of random effects assigned normal prior with unknown standard deviation

  • Catcher framing estimates are \(\{ca_m\}\)

  • Convert these to strikes added and runs saved
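Because the catcher effect lives on the log-odds scale, converting it to strikes added requires pushing it through the logistic function for each pitch. The sketch below is one simple way to do that, not necessarily the conversion used in the talk. The baseline log-odds values and the catcher effect of 0.2 are hypothetical.

```python
import math

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def strikes_added(baseline_logits, catcher_effect):
    """Expected extra called strikes: the change in strike probability
    from adding the catcher effect, summed over the pitches received."""
    return sum(
        logistic(eta + catcher_effect) - logistic(eta)
        for eta in baseline_logits
    )

# Hypothetical mix of called pitches, by baseline log-odds of a strike:
borderline = [0.0] * 1000      # p = 0.5, on the edge of the zone
clear_strikes = [4.0] * 1000   # p is about 0.98
clear_balls = [-4.0] * 1000    # p is about 0.02

effect = 0.2  # hypothetical catcher effect on the log-odds scale
added_borderline = strikes_added(borderline, effect)
added_total = strikes_added(borderline + clear_strikes + clear_balls, effect)

print(round(added_borderline, 1), round(added_total, 1))
```

In this toy mix, almost all of the added strikes come from the borderline pitches, which matches the intuition that framing matters most at the edges of the zone.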

2023 Framing Leaders from Baseball Savant

A Good Catcher Framer

  • Gives the defensive team more called strikes
  • Each added called strike saves about 0.03 runs
  • Best framers save their teams 10-20 runs

Catcher Framing

  • New measure of performance
  • Managers knew it existed, but not how to measure it
  • Available due to the new pitch tracking data
  • Really is an important skill of a catcher

Getting Started

Getting Started in Sports Analytics

  • Data science coursework (learning R, exploring data, modeling)

  • Get connected with a college sports team (baseball, basketball, soccer, volleyball)

  • Work on small projects and publicize your work (on a blog)

Tips for Getting a Job in Sports Analytics

  • People working in the field come from a wide variety of academic backgrounds, but data science is a great background

  • Teams want people who are able to pose good questions and follow an analysis from beginning to end

  • Communication skills are important

  • Get started through an internship for a sports team (Saberseminar)

Questions?